autoregressive process


Counterfactual Token Generation in Large Language Models

Chatzi, Ivi, Benz, Nina Corvelo, Straitouri, Eleni, Tsirtsis, Stratis, Gomez-Rodriguez, Manuel

arXiv.org Artificial Intelligence

"Sure, I am happy to generate a story for you: Captain Lyra stood at the helm of her trusty ship, the Maelstrom's Fury, gazing out at the endless sea. [...] Lyra's eyes welled up with tears as she realized the bitter truth - she had sacrificed everything for fleeting riches, and lost the love of her crew, her family, and herself." Although this story, generated by a large language model, is captivating, one may wonder -- how would the story have unfolded if the model had chosen "Captain Maeve" as the protagonist instead? We cannot know. State-of-the-art large language models are stateless -- they maintain no internal memory or state. Given a prompt, they generate a sequence of tokens as an output using an autoregressive process. As a consequence, they cannot reason about counterfactual alternatives to tokens they have generated in the past. In this work, our goal is to enhance them with this functionality. To this end, we develop a causal model of token generation that builds upon the Gumbel-Max structural causal model. Our model allows any large language model to perform counterfactual token generation at almost no cost compared with vanilla token generation; it is embarrassingly simple to implement and requires neither fine-tuning nor prompt engineering. We implement our model on Llama 3 8B-Instruct and Ministral-8B-Instruct and conduct a qualitative and a quantitative analysis of counterfactually generated text. We conclude with a demonstrative application of counterfactual token generation for bias detection, unveiling interesting insights about the model of the world constructed by large language models.
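The Gumbel-Max trick underlying this abstract can be sketched in a few lines: record the Gumbel noise used to sample a token, then replay the *same* noise against alternative logits to answer "what token would have been sampled otherwise?". This is a minimal illustrative sketch, not the authors' implementation; the function names and the toy three-token vocabulary are assumptions.

```python
import math
import random

def sample_with_noise(logits, rng):
    """Draw a token via the Gumbel-Max trick and record the noise."""
    noise = [-math.log(-math.log(rng.random())) for _ in logits]
    token = max(range(len(logits)), key=lambda i: logits[i] + noise[i])
    return token, noise

def counterfactual_token(new_logits, noise):
    """Replay the recorded noise against alternative logits: the token that
    would have been sampled had the context differed, all else equal."""
    return max(range(len(new_logits)), key=lambda i: new_logits[i] + noise[i])

rng = random.Random(0)
logits = [2.0, 1.0, 0.5]            # hypothetical next-token logits
token, noise = sample_with_noise(logits, rng)

# Identical logits reproduce the factual token exactly.
assert counterfactual_token(logits, noise) == token
# A strongly shifted logit flips the counterfactual outcome.
assert counterfactual_token([0.0, 100.0, 0.0], noise) == 1
```

Because the exogenous noise is fixed, the counterfactual generation costs no more than a second forward pass over the modified prompt, which is the "almost no cost" property the abstract highlights.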


Lag selection and estimation of stable parameters for multiple autoregressive processes through convex programming

Chakraborty, Somnath, Lederer, Johannes, von Sachs, Rainer

arXiv.org Artificial Intelligence

Motivated by a variety of applications, high-dimensional time series have become an active topic of research. In particular, several methods and finite-sample theories for individual stable autoregressive processes with known lag have become available very recently. We, instead, consider multiple stable autoregressive processes that share an unknown lag. We use information across the different processes to simultaneously select the lag and estimate the parameters. We prove that the estimated process is stable, and we establish forecasting-error rates that can outmatch the known single-process rates in our setting. Our insights on lag selection and stability are also of interest for the case of individual autoregressive processes.
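The idea of pooling information across processes to pick one shared lag can be sketched with ordinary least squares and a pooled BIC; the paper itself uses convex programming, so this stdlib-only version (and all function names in it) is only an assumed illustration of the shared-lag principle.

```python
import math
import random

def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for a small system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ar_rss(x, p):
    """Residual sum of squares of the least-squares AR(p) fit."""
    X = [[x[t - j] for j in range(1, p + 1)] for t in range(p, len(x))]
    y = x[p:]
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * yt for r, yt in zip(X, y)) for i in range(p)]
    phi = solve(A, b)
    return sum((yt - sum(c * v for c, v in zip(phi, r))) ** 2
               for r, yt in zip(X, y))

def select_shared_lag(series, max_lag):
    """Pick one lag for all processes by minimising a pooled BIC."""
    n_total = sum(len(x) for x in series)
    bics = []
    for p in range(1, max_lag + 1):
        rss = sum(ar_rss(x, p) for x in series)
        bics.append(n_total * math.log(rss / n_total) + p * math.log(n_total))
    return min(range(1, max_lag + 1), key=lambda p: bics[p - 1]), bics

rng = random.Random(1)
def simulate_ar2(n, a1=0.5, a2=-0.3):
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(a1 * x[-1] + a2 * x[-2] + rng.gauss(0, 1))
    return x[2:]

series = [simulate_ar2(400) for _ in range(5)]
lag, bics = select_shared_lag(series, max_lag=4)
assert bics[lag - 1] == min(bics)  # the selected lag minimises the pooled BIC
```

Pooling the residual sums across all five processes is what lets a common lag be identified from less data per process than individual selection would need.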


Recurrent Convolutional Deep Neural Networks for Modeling Time-Resolved Wildfire Spread Behavior

Burge, John, Bonanni, Matthew R., Hu, R. Lily, Ihme, Matthias

arXiv.org Artificial Intelligence

The increasing incidence and severity of wildfires underscores the necessity of accurately predicting their behavior. While high-fidelity models derived from first principles offer physical accuracy, they are too computationally expensive for use in real-time fire response. Low-fidelity models sacrifice some physical accuracy and generalizability via the integration of empirical measurements, but enable real-time simulations for operational use in fire response. Machine learning techniques offer the ability to bridge these objectives by learning first-principles physics while achieving computational speedup. While deep learning approaches have demonstrated the ability to predict wildfire propagation over large time periods, time-resolved fire-spread predictions are needed for active fire management. In this work, we evaluate the ability of deep learning approaches to accurately model the time-resolved dynamics of wildfires. We use an autoregressive process in which a convolutional recurrent deep learning model makes predictions that propagate a wildfire in 15-minute increments. We demonstrate the model on three simulated datasets of increasing complexity, containing both field fires with homogeneous fuel distribution and real-world topologies sampled from the California region of the United States. We show that even after 100 autoregressive predictions representing more than 24 hours of simulated fire spread, the resulting models generate stable and realistic propagation dynamics, achieving a Jaccard score between 0.89 and 0.94 when predicting the resulting fire scar.
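The autoregressive rollout and the Jaccard score can be illustrated on a toy grid: a stand-in "model" spreads the fire front one step, its output is fed back as the next input, and the final scar is compared against a reference mask by intersection over union. The `step` function below is an assumed toy surrogate, not the paper's convolutional recurrent network.

```python
def step(grid):
    """Toy surrogate for one 15-minute model update: spread fire to
    4-connected neighbours (stands in for the learned forward pass)."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for i in range(n):
        for j in range(m):
            if grid[i][j]:
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    a, b = i + di, j + dj
                    if 0 <= a < n and 0 <= b < m:
                        out[a][b] = 1
    return out

def rollout(grid, steps):
    """Autoregressive rollout: feed each prediction back as the next input."""
    for _ in range(steps):
        grid = step(grid)
    return grid

def jaccard(a, b):
    """Intersection over union of two binary fire masks."""
    inter = sum(x and y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x or y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union

ignition = [[0] * 7 for _ in range(7)]
ignition[3][3] = 1                      # single ignition point
scar = rollout(ignition, steps=3)
assert jaccard(scar, scar) == 1.0       # identical scars score perfectly
assert jaccard(ignition, scar) < 1.0    # the fire has grown
```

The key fragility the abstract addresses is that errors compound through this feedback loop; a Jaccard score of 0.89-0.94 after 100 such steps indicates the learned dynamics stay stable.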


Understanding Random Coefficient Autoregressive Processes

#artificialintelligence

Abstract: Many studies on biological and soft matter systems report the joint presence of a linear mean-squared displacement and a non-Gaussian probability density exhibiting, for instance, exponential or stretched-Gaussian tails. This phenomenon is ascribed to the heterogeneity of the medium and is captured by random parameter models such as "superstatistics" or "diffusing diffusivity". Independently, scientists working in the area of time series analysis and statistics have studied a class of discrete-time processes with similar properties, namely, random coefficient autoregressive models. In this work we try to reconcile these two approaches and thus provide a bridge between physical stochastic processes and autoregressive models. We start from the basic Langevin equation of motion with time-varying damping or diffusion coefficients and establish the link to random coefficient autoregressive processes.
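The discrete-time process the abstract links to is easy to simulate: a random coefficient AR(1), where the autoregressive coefficient itself fluctuates from step to step, modelling a heterogeneous medium. This is an assumed minimal sketch (parameter values and function name are illustrative, not from the paper).

```python
import random

def simulate_rca(n, phi=0.5, eta_sd=0.5, seed=0):
    """Random coefficient AR(1): x_t = (phi + eta_t) x_{t-1} + eps_t,
    with eta_t ~ N(0, eta_sd^2) modelling a fluctuating medium."""
    rng = random.Random(seed)
    x, xs = 0.0, []
    for _ in range(n):
        x = (phi + rng.gauss(0, eta_sd)) * x + rng.gauss(0, 1)
        xs.append(x)
    return xs

# Second-order stationarity requires E[(phi + eta)^2] = phi^2 + eta_sd^2 < 1.
assert 0.5 ** 2 + 0.5 ** 2 < 1

xs = simulate_rca(10_000)
mean = sum(xs) / len(xs)
var = sum((v - mean) ** 2 for v in xs) / len(xs)
# Stationary variance is 1 / (1 - phi^2 - eta_sd^2) = 2: the random
# coefficient inflates fluctuations beyond the unit innovation variance.
assert var > 1.2
```

The inflated variance and the non-Gaussian marginals of such processes are exactly the exponential-tail phenomena the abstract connects to "diffusing diffusivity" models.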


A Finite-Sample Deviation Bound for Stable Autoregressive Processes

González, Rodrigo A., Rojas, Cristian R.

arXiv.org Machine Learning

In this paper, we study non-asymptotic deviation bounds of the least squares estimator in Gaussian AR($n$) processes. By relying on martingale concentration inequalities and a tail-bound for $\chi^2$ distributed variables, we provide a concentration bound for the sample covariance matrix of the process output. With this, we present a problem-dependent finite-time bound on the deviation probability of any fixed linear combination of the estimated parameters of the AR$(n)$ process. We discuss extensions and limitations of our approach.
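For the scalar AR(1) special case, the least squares estimator the paper analyses has a one-line closed form, which makes the finite-sample deviation behaviour easy to probe empirically. This is an assumed sketch of the estimator for n = 1, not the paper's general AR(n) analysis or its bound.

```python
import random

def simulate_ar1(n, phi=0.7, seed=3):
    """Stable Gaussian AR(1) path with unit innovation variance."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n):
        x.append(phi * x[-1] + rng.gauss(0, 1))
    return x

def least_squares_ar1(x):
    """Closed-form OLS estimate: sum(x_t x_{t-1}) / sum(x_{t-1}^2)."""
    num = sum(a * b for a, b in zip(x[1:], x[:-1]))
    den = sum(a * a for a in x[:-1])
    return num / den

x = simulate_ar1(5000)
phi_hat = least_squares_ar1(x)
# Asymptotic std. error is roughly sqrt((1 - phi^2) / T) ~ 0.01 here,
# so a 0.1 deviation band is crossed with negligible probability.
assert abs(phi_hat - 0.7) < 0.1
```

The denominator here is the sample covariance quantity whose concentration the paper controls via martingale inequalities and a chi-squared tail bound.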


Autoregressive Policies for Continuous Control Deep Reinforcement Learning

Korenkevych, Dmytro, Mahmood, A. Rupam, Vasan, Gautham, Bergstra, James

arXiv.org Artificial Intelligence

Reinforcement learning algorithms rely on exploration to discover new behaviors, which is typically achieved by following a stochastic policy. In continuous control tasks, policies with a Gaussian distribution have been widely adopted. Gaussian exploration, however, does not result in smooth trajectories, which generally correspond to safe and rewarding behaviors in practical tasks. In addition, Gaussian policies do not result in effective exploration of an environment and become increasingly inefficient as the action rate increases. This contributes to the low sample efficiency often observed in learning continuous control tasks. We introduce a family of stationary autoregressive (AR) stochastic processes to facilitate exploration in continuous control domains. We show that the proposed processes possess two desirable features: subsequent process observations are temporally coherent with a continuously adjustable degree of coherence, and the process's stationary distribution is standard normal. We derive an autoregressive policy (ARP) that implements such processes while maintaining the standard agent-environment interface. We show how ARPs can be easily used with existing off-the-shelf learning algorithms. Empirically, we demonstrate that using ARPs results in improved exploration and sample efficiency in both simulated and real-world domains and, furthermore, provides smooth exploration trajectories that enable safe operation of robotic hardware.
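The two properties claimed in the abstract (adjustable temporal coherence, standard normal stationary distribution) are both visible in the classic AR(1) noise recursion below; this is an assumed one-dimensional sketch, not the paper's general family of AR processes.

```python
import math
import random

def ar_noise(n, alpha, seed=0):
    """Stationary AR(1) noise: z_{t+1} = alpha z_t + sqrt(1 - alpha^2) eps_t.
    The sqrt(1 - alpha^2) scaling keeps the stationary distribution exactly
    N(0, 1) for any coherence level alpha in [0, 1)."""
    rng = random.Random(seed)
    z, out = rng.gauss(0, 1), []
    for _ in range(n):
        z = alpha * z + math.sqrt(1 - alpha ** 2) * rng.gauss(0, 1)
        out.append(z)
    return out

# Highly coherent (smooth) exploration noise with N(0, 1) marginals.
smooth = ar_noise(20_000, alpha=0.95)
mean = sum(smooth) / len(smooth)
var = sum((v - mean) ** 2 for v in smooth) / len(smooth)
assert abs(mean) < 0.2 and abs(var - 1.0) < 0.2
```

Because the marginals stay standard normal regardless of alpha, such noise can be dropped in wherever Gaussian exploration is used while making consecutive actions temporally coherent, which is what yields the smooth trajectories the abstract emphasises.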


Missing Data in Sparse Transition Matrix Estimation for Sub-Gaussian Vector Autoregressive Processes

Jalali, Amin, Willett, Rebecca

arXiv.org Machine Learning

High-dimensional time series data exist in numerous areas such as finance, genomics, healthcare, and neuroscience. An unavoidable aspect of all such datasets is missing data, and dealing with this issue has been an important focus in statistics, control, and machine learning. In this work, we consider a high-dimensional estimation problem where a dynamical system, governed by a stable vector autoregressive model, is randomly and only partially observed at each time point. Our task amounts to estimating the transition matrix, which is assumed to be sparse. In such a scenario, where covariates are highly interdependent and partially missing, new theoretical challenges arise. While transition matrix estimation in vector autoregressive models has been studied previously, the missing data scenario requires separate efforts. Moreover, while transition matrix estimation can be studied from a high-dimensional sparse linear regression perspective, the covariates are highly dependent and existing results on regularized estimation with missing data from i.i.d.~covariates are not applicable. At the heart of our analysis lies 1) a novel concentration result when the innovation noise satisfies the convex concentration property, as well as 2) a new quantity for characterizing the interactions of the time-varying observation process with the underlying dynamical system.
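A scalar sketch conveys the partially-observed setup: simulate a stable AR(1), hide each value independently, and estimate the transition coefficient from the fully observed consecutive pairs. This assumed toy is far simpler than the paper's sparse high-dimensional estimator, but it shows why independent masking leaves a consistent estimator available.

```python
import random

rng = random.Random(7)
phi_true, q = 0.7, 0.8   # AR coefficient and per-sample observation probability

# Simulate a stable AR(1) path, then hide each value independently.
x = [0.0]
for _ in range(20_000):
    x.append(phi_true * x[-1] + rng.gauss(0, 1))
observed = [v if rng.random() < q else None for v in x]

# Estimate from fully observed consecutive pairs only. Because the masking
# is independent of the process, the pair-complete moments are unbiased.
num = den = 0.0
for prev, cur in zip(observed, observed[1:]):
    if prev is not None and cur is not None:
        num += prev * cur
        den += prev * prev
phi_hat = num / den
assert abs(phi_hat - phi_true) < 0.1
```

In the high-dimensional vector case this simple device no longer suffices: the covariates are dependent across coordinates and time, which is precisely why the paper needs its new concentration results.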


Inference of High-dimensional Autoregressive Generalized Linear Models

Hall, Eric C., Raskutti, Garvesh, Willett, Rebecca

arXiv.org Machine Learning

Vector autoregressive models characterize a variety of time series in which linear combinations of current and past observations can be used to accurately predict future observations. For instance, each element of an observation vector could correspond to a different node in a network, and the parameters of an autoregressive model would correspond to the impact of the network structure on the time series evolution. Often these models are used successfully in practice to learn the structure of social, epidemiological, financial, or biological neural networks. However, little is known about statistical guarantees on estimates of such models in non-Gaussian settings. This paper addresses the inference of the autoregressive parameters and associated network structure within a generalized linear model framework that includes Poisson and Bernoulli autoregressive processes. At the heart of this analysis is a sparsity-regularized maximum likelihood estimator. While sparsity-regularization is well-studied in the statistics and machine learning communities, those analysis methods cannot be applied to autoregressive generalized linear models because of the correlations and potential heteroscedasticity inherent in the observations. Sample complexity bounds are derived using a combination of martingale concentration inequalities and modern empirical process techniques for dependent random variables. These bounds, which are supported by several simulation studies, characterize the impact of various network parameters on estimator performance.
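A Bernoulli autoregressive process, one of the GLM instances the abstract names, can be simulated in a few lines; the single-node parameters below are assumptions chosen only to make the self-excitation visible.

```python
import math
import random

def simulate_bernoulli_ar(n, a=-1.0, b=2.0, seed=0):
    """Bernoulli autoregressive GLM: x_t ~ Bernoulli(sigmoid(a + b * x_{t-1})).
    The coefficient b plays the role of a (here one-node) network weight."""
    rng = random.Random(seed)
    x, xs = 0, []
    for _ in range(n):
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))
        x = 1 if rng.random() < p else 0
        xs.append(x)
    return xs

xs = simulate_bernoulli_ar(10_000)
assert set(xs) <= {0, 1}

# A positive b makes a spike more likely right after a spike:
# P(x_t = 1 | x_{t-1} = 1) = sigmoid(1) ~ 0.73 vs sigmoid(-1) ~ 0.27.
after_one = [nxt for cur, nxt in zip(xs, xs[1:]) if cur == 1]
after_zero = [nxt for cur, nxt in zip(xs, xs[1:]) if cur == 0]
assert sum(after_one) / len(after_one) > sum(after_zero) / len(after_zero)
```

The conditional heteroscedasticity visible here (the variance of x_t depends on x_{t-1}) is the feature that blocks off-the-shelf sparse regression analyses and motivates the paper's martingale-based sample complexity bounds.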


Short-term time series prediction using Hilbert space embeddings of autoregressive processes

Valencia, Edgar A., Álvarez, Mauricio A.

arXiv.org Machine Learning

Linear autoregressive models serve as basic representations of discrete-time stochastic processes. Different attempts have been made to provide non-linear versions of the basic autoregressive process, including several based on kernel methods. Motivated by the powerful framework of Hilbert space embeddings of distributions, in this paper we apply this methodology to the kernel embedding of an autoregressive process of order $p$. In doing so, we provide a non-linear version of the autoregressive process that shows increased performance over the linear model on highly complex time series. We use the proposed method for one-step-ahead forecasting of several time series and compare its performance against other non-linear methods.
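A kernel-weighted regression over lag-$p$ windows gives a feel for this family of predictors: each historical transition (window of $p$ past values, next value) is weighted by an RBF kernel on its similarity to the current window. This Nadaraya-Watson sketch is an assumed stand-in for the embedding-based predictor, not the paper's method.

```python
import math

def kernel_forecast(history, p, bandwidth=0.1):
    """One-step-ahead forecast: weight every historical transition
    (window of p past values -> next value) by an RBF kernel comparing
    that window to the most recent one."""
    query = history[-p:]
    num = den = 0.0
    for t in range(p, len(history)):
        window = history[t - p:t]
        d2 = sum((a - b) ** 2 for a, b in zip(window, query))
        w = math.exp(-d2 / (2 * bandwidth ** 2))
        num += w * history[t]
        den += w
    return num / den

# On a noiseless period-4 signal the forecast recovers the next value:
# the last window (0.0, -1.0) is always followed by 0.0 in the history.
series = [0.0, 1.0, 0.0, -1.0] * 50
pred = kernel_forecast(series, p=2)
assert abs(pred - 0.0) < 1e-6
```

Unlike a linear AR(2) fit, this predictor can reproduce any deterministic map from windows to next values, which is the sense in which kernelisation buys non-linearity.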


Sparse Principal Component Analysis for High Dimensional Vector Autoregressive Models

Wang, Zhaoran, Han, Fang, Liu, Han

arXiv.org Machine Learning

We study sparse principal component analysis for high dimensional vector autoregressive time series under a doubly asymptotic framework, which allows the dimension $d$ to scale with the series length $T$. We treat the transition matrix of time series as a nuisance parameter and directly apply sparse principal component analysis on multivariate time series as if the data are independent. We provide explicit non-asymptotic rates of convergence for leading eigenvector estimation and extend this result to principal subspace estimation. Our analysis illustrates that the spectral norm of the transition matrix plays an essential role in determining the final rates. We also characterize sufficient conditions under which sparse principal component analysis attains the optimal parametric rate. Our theoretical results are backed up by thorough numerical studies.
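Treating the serially dependent observations "as if independent" means running an ordinary sparse PCA routine on the sample covariance. The truncated power iteration below is one standard such routine, used here purely as an assumed illustration (the paper's analysis is agnostic to the particular sparse PCA solver).

```python
def truncated_power_iteration(S, k, iters=100):
    """Leading sparse eigenvector of covariance S: power iteration that
    keeps only the k largest-magnitude coordinates at each step."""
    d = len(S)
    v = [1.0 / d ** 0.5] * d
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(d)) for i in range(d)]
        kept = sorted(range(d), key=lambda i: -abs(w[i]))[:k]
        w = [w[i] if i in kept else 0.0 for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# A covariance whose leading eigenvector is supported on coordinates 0 and 1.
S = [[4.0, 1.5, 0.0, 0.0],
     [1.5, 4.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.1],
     [0.0, 0.0, 0.1, 1.0]]
v = truncated_power_iteration(S, k=2)
assert v[2] == 0.0 and v[3] == 0.0        # sparsity pattern recovered
assert abs(abs(v[0]) - abs(v[1])) < 1e-6  # symmetric loading on the support
```

The paper's contribution is showing when this "pretend i.i.d." pipeline still achieves near-optimal rates for VAR data, with the spectral norm of the (ignored) transition matrix governing the slowdown.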